Sequential Weighting Algorithms for Multi-Alphabet Sources∗

Authors

Tjalling J. Tjalkens

Yuri M. Shtarkov

Frans M.J. Willems

Abstract

The sequential Context Tree Weighting procedure [6] achieves the asymptotically optimal redundancy behavior of (k/2) · (log n)/n, where k is the number of free parameters of the source and n is the sequence length. For FSMX sources with an alphabet A, this number k is (|A| − 1) · |Ka|. However, FSMX sources in particular often use only a few of the possible letters in each state. It would be desirable for the linear term in the redundancy to be determined by the number of letters actually used in the states instead of by the alphabet size. We discuss sequential methods that achieve this redundancy behavior. First we describe a ‘natural’ use of the weighting principle on an appropriate class of sources, and next we introduce a binary derived model in which we use an appropriate estimator.
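To give a feel for the gap the abstract describes, the following minimal sketch evaluates the term (k/2) · (log n)/n for two parameter counts. It is an illustration under stated assumptions, not the paper's method: the factor written |Ka| is treated as a plain state count (`num_states`), the logarithm is taken base 2, and the "letters actually used" count is assumed to contribute one fewer free parameter than the number of letters used in each state. All function names are hypothetical.

```python
import math


def redundancy_term(num_free_params: int, n: int) -> float:
    """Evaluate (k/2) * log2(n) / n, in bits per source symbol.

    Assumption: base-2 logarithm; the abstract only writes "log".
    """
    return 0.5 * num_free_params * math.log2(n) / n


def free_params_full_alphabet(alphabet_size: int, num_states: int) -> int:
    """k = (|A| - 1) * (number of states), as stated in the abstract.

    Assumption: num_states stands in for the factor written |Ka|.
    """
    return (alphabet_size - 1) * num_states


def free_params_used_letters(letters_used_per_state: list[int]) -> int:
    """Illustrative count: (letters used - 1) summed over the states.

    Assumption: this is only a stand-in for the behavior the abstract
    aims at; the paper's exact expression is not given in the abstract.
    """
    return sum(max(m - 1, 0) for m in letters_used_per_state)


if __name__ == "__main__":
    n = 2 ** 20
    k_full = free_params_full_alphabet(alphabet_size=256, num_states=4)
    k_used = free_params_used_letters([3, 2, 5, 1])
    print("full alphabet:", k_full, redundancy_term(k_full, n))
    print("used letters: ", k_used, redundancy_term(k_used, n))
```

With a byte alphabet and only a handful of letters used per state, the two parameter counts differ by orders of magnitude, which is the motivation the abstract gives for the methods it proposes.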

Similar resources

Count-Based Frequency Estimation with Bounded Memory

Count-based estimators are a fundamental building block of a number of powerful sequential prediction algorithms, including Context Tree Weighting and Prediction by Partial Matching. Keeping exact counts, however, typically results in a high memory overhead. In particular, when dealing with large alphabets the memory requirements of count-based estimators often become prohibitive. In this paper...

Sparse Sequential Dirichlet Coding

This short paper describes a simple coding technique, Sparse Sequential Dirichlet Coding, for multi-alphabet memoryless sources. It is appropriate in situations where only a small, unknown subset of the possible alphabet symbols can be expected to occur in any particular data sequence. We provide a competitive analysis which shows that the performance of Sparse Sequential Dirichlet Coding will ...

Implementing the Context Tree Weighting Method for Text Compression

Context tree weighting method is a universal compression algorithm for FSMX sources. Though we expect that it will have good compression ratio in practice, it is difficult to implement it and in many cases the implementation is only for estimating compression ratio. Though Willems and Tjalkens showed practical implementation using not block probabilities but conditional probabilities, it is use...

Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition

We present worst case bounds for the learning rate of a known prediction method that is based on hierarchical applications of binary context tree weighting (CTW) predictors. A heuristic application of this approach that relies on Huffman’s alphabet decomposition is known to achieve state-of-the-art performance in prediction and lossless compression benchmarks. We show that our new bound for this...

On-line string matching algorithms: survey and experimental results

In this paper we present a short survey and experimental results for well known sequential string matching algorithms. We consider algorithms based on different approaches including classical, suffix automata, bit-parallelism and hashing. We put special emphasis on algorithms recently presented such as Shift-Or and BNDM algorithms. We compare these algorithms in terms of the number of character...

Publication date: 1993